The genome-wide mutational consequences of DNA hypomethylation

DNA methylation is important for establishing and maintaining cell identity and for genomic stability. This is achieved by regulating the accessibility of regulatory and transcriptional elements and the compaction of subtelomeric, centromeric, and other inactive genomic regions. Carcinogenesis is accompanied by a global loss in DNA methylation, which facilitates the transformation of cells. Cancer hypomethylation may also cause genomic instability, for example through interference with the protective function of telomeres and centromeres. However, understanding the role(s) of hypomethylation in tumor evolution is incomplete because the precise mutational consequences of global hypomethylation have thus far not been systematically assessed. Here we made genome-wide inventories of all possible genetic variation that accumulates in single cells upon the long-term global hypomethylation by CRISPR interference-mediated conditional knockdown of DNMT1. Depletion of DNMT1 resulted in a genomewide reduction in DNA methylation. The degree of DNA methylation loss was similar to that observed in many cancer types. Hypomethylated cells showed reduced proliferation rates, increased transcription of genes, reactivation of the inactive X-chromosome and abnormal nuclear morphologies. Prolonged hypomethylation was accompanied by increased chromosomal instability. However, there was no increase in mutational burden, enrichment for certain mutational signatures or accumulation of structural variation to the genome. In conclusion, the primary consequence of hypomethylation is genomic instability, which in cancer leads to increased tumor heterogeneity and thereby fuels cancer evolution.

www.nature.com/scientificreports/ pericentric heterochromatin, LINE elements and subtelomeric regions 21,22 . Tumorigenesis is generally accompanied by a global decrease in DNA methylation 23,24 , which leads to the concomitant deprotection of these heterochromatic regions. Pericentromeric heterochromatin, LINE elements and subtelomeric regions are frequently affected in cancer [25][26][27][28] , indicating that the cancer-associated changes to the DNA methylation landscape coincide with genome instability. Thus, the epigenetic instability of transforming cells may interfere with the protective role of DNA methylation in maintaining genomic integrity, leading to genomic instability. However, the genetic-epigenetic relationship is complex and our knowledge on the genomic consequences of the epigenetic reprogramming that coincide with cancer development is far from complete. To improve our view on cancer evolution and tumor heterogeneity we need a better understanding of how reprogramming of the epigenetic software influences the genetic hardware. In recent years, whole genome sequencing of cancer genomes has increasingly been used for the identification of mutational signatures, distinct patterns of mutation accumulation that provide insight into past mutational processes, such as DNA repair deficiencies, endogenous mutational processes, and exposure to exogenous mutagens such as tobacco-smoking, UV-light or anti-cancer treatments 29 . Over 100 mutational signatures have been described that involve single, double and clustered base substitution signatures and indel signatures 30,31 . In addition, sixteen signatures have been identified that are based on patterns of structural variation such as deletions, tandem duplications, and inversions 32 . The etiology for many mutational signatures is still unknown.
In spite of the myriad genomic consequences that may follow DNA methylation loss, the genomic consequences of long-term DNA hypomethylation have thus far not been systematically characterized and no mutational signatures have been linked to DNA methylation loss. In this study, we aimed to fully characterize all genomic consequences of global methylation loss; from the whole chromosome level down to single base resolution. We performed CRISPR/CAS9 mediated conditional knockdown of DNMT1 and examined the genetic consequences after prolonged culture by whole genome sequencing at 30X coverage of expanded clones to capture events at high resolution affecting few or single bases and by single cell DNA sequencing to capture chromosome scale events.

Results
For these experiments we made use of a human female TERT immortalized retinal pigment epithelial cell line (RPE-1) 33,34 , which lacks a transformed phenotype and is near diploid. This is a valuable and powerful cell line for the study of DNA damage, genomic instability, and mutational signature analysis, because RPE-1 cells can be cultured long-term, while maintaining a stable karyotype with a modal chromosome number of 46 [35][36][37] . Because of its key role in maintenance of methylation, loss of DNMT1 activity should lead to a progressive loss of DNA methylation in cycling cells. In most cell lines, DNMT1 is an essential gene 38,39 . In line with these observations, our attempts to create a DNMT1 knockout in RPE-1 cells failed, while knockouts for non-essential genes were successful (data not shown). Therefore, we decided to use CRISPR interference (CRISPRi) to knock down DNMT1 gene expression 40 . In CRISPRi, a nuclease dead version of Cas9 (dCAS9) fused to the transcriptional inhibitor KRAB 41 (dCAS9-KRAB) is guided to the transcriptional start site by a sgRNA, thereby blocking the transcription machinery and resulting in decreased expression of the gene of interest. In combination with a doxycycline inducible sgRNA, CRISPRi allows temporal downregulation in full isogenic cell lines.
We established clonal hTERT RPE-1 dCAS9-KRAB doxycycline inducible sgRNA DNMT1 cell lines (hereafter referred to as DNMT1 knockdown cells). The lines were P53 -/knock-out ( Supplementary Fig. 1a,b) to increase the permissiveness of cells for all types of mutations and avoid loss of such events due to P53-induced senescence or apoptosis. Three independent clonal DNMT1 knockdown lines showed clear upregulation of the sgRNA upon addition of doxycycline (Fig. 1a). All next experiments were performed with clonal line 2, which showed robust and long-term downregulation of DNMT1 at the mRNA and protein levels upon sgRNA induction ( Fig. 1a-c, Supplementary Fig. 2). Methylation arrays covering over 850,000 sites with extensive coverage of CpG islands, genes, and enhancers revealed a genome wide reduction in methylation levels in DNMT1 knockdown cells in three independent experiments (Fig. 1d).Genome-wide, the mean degree of methylation was reduced by 10% upon DNMT1 inhibition, while regions with β-values > 0.5 showed a reduction of 16% in methylation levels. These reductions in DNA methylation levels are similar to or even larger than has been previously described in a wide variety of solid tumors, including colorectal, lung, and breast cancer that mostly show reductions of around 10% 24,42 . Therefore, our DNMT1 knockdown cell line is a highly suitable model to study if the loss in methylation as observed in cancer can induce mutations or genomic instability. Methylation loss was independent of genomic location and observed in genes, promoters and CpG islands (Fig. 1d,e).
Prolonged downregulation of DNMT1 delayed growth by approximately 50%, without leading to a full cell cycle arrest (Fig. 2a), which is in line with previous observations 19,20 . Further characterization of DNMT1 knockdown cells by RNA seq revealed that global hypomethylation resulted in a relative increase in gene activity. In total, 501 genes were up-and 73 genes were downregulated (p adj < 0.05) in DNMT1 knockdown cells. Reassuringly, DNMT1 was among the down regulated genes, while other writers and erasers of DNA methylation were not differentially expressed between both conditions (Fig. 2b,c). Strikingly, more genes on the X-chromosome were upregulated than expected, indicating partial reactivation of the inactive X-chromosome (Fig. 2d).
To further examine the correlation between DNA methylation and gene expression, we examine the DNA methylation status at promoter regions and gene bodies of genes that were differentially expressed. Promoter regions of upregulated genes were hypermethylated in control cells and showed a significant reduction of 13% in methylation status upon DNMT1 knockdown (Fig. 2e), which is in line with the repressive nature of promoter methylation 43 . In contrast, promoter regions of downregulated genes showed much lower methylation levels in control cells when compared to upregulated genes. Also the reduction in DNA methylation was less pronounced www.nature.com/scientificreports/ www.nature.com/scientificreports/ (7.5%). In gene bodies, DNA methylation was reduced in both the upregulated gene set as well as the downregulated geneset, with respectively 12% and 11% (Fig. 2e). Doxycycline treated cells acquired an abnormal kidney-shaped nuclear morphology, indicating that DNA methylation loss disturbs global genome organization (Fig. 2f).
CRISPRi mediated knockdown of DNMT1 in RPE-1 cells is a relatively clean method to reduce DNA methylation compared to alternative methods such as short hairpin mediated knockdown or chemical inhibition that have more risk of adverse side-effects unrelated to the loss of DNA methylation and are generally restricted to short-term consequences [44][45][46] . Together, this conditional DNMT1 knockdown cell line is a powerful tool to study the long-term consequences of DNA methylation loss.
To examine the genomic consequences of global DNA methylation loss in single cells at nucleotide resolution we performed prolonged downregulation of DNMT1 (6 weeks) to allow putative genetic changes to occur, followed by single clone expansion. Whole genome sequencing at 30X coverage of expanded clones followed by computational analysis enables the identification of all types of genetic variation, including single base substitutions (SBSs) and indels that are private to each clone. We previously applied this highly sensitive method to identify and characterize the mutational signatures in individual cells in healthy, diseased, and perturbed conditions [47][48][49][50][51][52][53] . Loss of DNA methylation has previously been associated with mismatch repair (MMR) deficiency 54 and therefore we anticipated to find an elevated mutational burden as well as an increase in the contribution of MMR-related mutational signatures. In contrast to our expectations we did not observe an increase in SBS burden (Student's t-test, p value = 0.18; Fig. 3a) or in the number of indels (Student's t-test, p value = 0.33; Fig. 3b), although numbers tended to be lower in the hypomethylated clones when compared to the control condition, possibly reflecting the difference in proliferation rates (Fig. 2a). The mutational spectrum (Fig. 3c) and 96 mutational profiles (Fig. 3d-e) were highly similar between hypomethylated cells and control cells, indicating that reduced methylation did not lead to a shift in the activity of mutational processes. Notably, in this study we relied on a sample size set (n = 3 per condition) similar to previous studies that also used manipulated tissue culture systems and which has been adequate for the detection of differences in mutation rates and to identify induced mutational patterns 47,50,55 . Structural variant (SVs) analysis revealed that most SVs were shared between clones of both conditions indicating these SVs were acquired prior to DNMT1 knockdown (Fig. 3f). Notable exception was a chromothripsis event that affected the q-arm of Chromosome 19, which was only present in two of the three clones of the hypomethylated cells (Fig. 3f).
Chromothripsis has mechanistically been linked to the missegregation of chromosomes or chromosome arms followed by damage and non-homologous repair in micronuclei 56 . The observed chromothripsis event may therefore be the result of increased chromosomal instability (CIN), as has been described previously for hypomethylated cells 57 . To further investigate CIN, we performed single cell DNA sequencing (Fig. 4a). Doxycycline treatment for 6 weeks of DNMT1 knockdown cells resulted in increased aneuploidy and higher cellular heterogeneity (Fig. 4b). We reasoned that the increased chromosomal instability was the result of the loss of DNA methylation in peri-centromeric regions. However, the repetitive nature of these regions precludes assessment of DNA methylation by DNA methylation arrays. Therefore, we decided to perform long-read Nanopore sequencing of peri-centromeres, which allows both the direct measurement of methylated cytosines as well as unique mapping to the telomere-to-telomere reference genome 58 . Similar to the genomic regions that were assessed by methylation arrays, pericentromeric regions of doxycycline-treated cells showed a consistent reduction in DNA methylation levels (Fig. 4c).
Discussion and Conclusion. Ample evidence has established that DNA methylation is required for genomic integrity [15][16][17][18][19][20] . However, DNA methylation is associated with many different genomic elements providing different routes towards genomic instability upon methylation loss 59 . Reduced methylation can destabilize pericentromeric heterochromatin inducing rearrangements of chromosome arms, as has been described in for example Wilms tumors and hepatocellular carcinoma 60,61 , and upon chemical interference of cell lines with the DNA hypomethylating drug 5-aza-2'-deoxycytidine 20 . Additionally, loss of methylation at repetitive elements may lead to genomic instability through secondary DNA structures that cause replication stress, DNA breaks and recombination between repeats 59 . Methylation loss can also lead to derepression of transposable elements that in turn can promote genomic rearrangements 26,62,63 . A decrease in methylation could also lead to more transcription and formation of R-loops, which are a source of replication stress, DNA breaks and genome instability 64 . Furthermore, reduced methylation at subtelomeric regions could lead to telomere instability 65 . Finally, DNA methylation has been implicated in DNA mismatch-repair 54 and the absence of DNA methylation may lead to mild MMR-deficiency. Because of these possible effects of DNA methylation loss, we hypothesized that experimentally induced DNA demethylation would lead to a broad spectrum of genetic variation, ranging from SBSs, to indels, and from SVs to chromosomal aberrations. With a systematic approach, using highly sensitive methods to measure all types of genetic variation in individual cells 47,49,66 , we identified chromosomal instability as the foremost mutational consequence of DNA methylation loss.
There are several possible explanations for why we did not observe other types of genetic variation. In our model, DNA methylation loss was incomplete, so it is possible that more genetic events beyond chromosomal instability will occur if DNA methylation is further reduced, although this would likely severely impact on viability of the cells and may not be physiologically relevant in the context of cancer. Additionally, the hypomethylation effects may be cell-type specific. For example, the RPE-1 cell line that we used was immortalized by overexpression of hTERT, which may counteract any destabilization of telomeres caused by DNA methylation loss. Other cell types may be more sensitive to DNA hypomethylation in peri-telomeric regions. The effects of DNA methylation loss may also be context dependent and require additional circumstances. For example, global DNA demethylation may derepress transposons, but these may only become active when accompanied by the loss of repressive histone modifications and when post-transcriptional and post-translation defense mechanisms www.nature.com/scientificreports/ that suppress their mobility also fail. The mutational patterns induced by DNA methylation loss may also be too subtle to be detected above the background of more common mutational processes, e.g. those induced by culturing conditions 47 . Taken together, while our observations do not support an important role for DNA methylation in mutational processes other than CIN, subtle effects may be detected by capturing more variants, for example by extending the culture period of hypomethylated cells or by the analysis of vast amounts of clones expanded from single cells, which would be a very costly endeavor. The exact mechanism through which hypomethylation causes CIN is as yet unknown. Proposed non-exclusive mechanisms include increased recombination of centromeric repeats, increased DNA breaks in centromeres, dysregulation of the centromeric protein network, increased transcription of α-satellite transcripts, defective www.nature.com/scientificreports/ assembly of centromeres/kinetochores and premature cohesion loss 22,32 . Future work may uncover the mechanistic link between pericentromeric hypomethylation and chromosomal instability. In conclusion, we observed that the degree of hypomethylation that can be observed in many cancer types primarily promotes CIN, without inducing other mutational processes. Hypomethylation induced CIN may promote tumor heterogeneity and cancer evolution.

Molecular cloning and virus production.
We replaced the shRNA scaffold of pLV.FUTG.Tet.Inducible.
shRNA with the sgRNA scaffold from lenticrispr v2 67 , by InFusion cloning (Takara). Next, we annealed oligonucleotides for the guideRNA of DNMT1 (Top: CAC CGG GTA CGC GCC GGC ATC TCG G; Bottom: AAA CCC GAG ATG CCG GCG CGT ACC C) and subsequently ligated the product into BsmbI digested pLV.TETi.sgRNA to www.nature.com/scientificreports/ obtain pLV.TETi.sgDNMT1. Selection of this sgRNA sequence was predicted to be highly effective for CRISPR interference based on chromatin, position, and sequence features 40 . To make RPE-1 cells more permissive to the accumulation of genetic variation, TP53 knock-outs were generated with sgRNAs targeting the gene sequence 5′-GGG CAG CTA CGG TTT CCG TC-3′. The annealed DNA oligonucleotides were cloned in px458 that also contains Cas9 followed by a 2A-EGFP sequence 68 . For virus production, lentiviral plasmids carrying the transgenes of interest were co-transfected with pRSV-REV, pMD2g and pMDLG-pRRE in HEK293T cells. Virus was harvested 3 days post-transfection and collected for immediate use or concentrated using Lenti-X Concentrator solution and stored at -80. Concentrated or undiluted virus was added to RPE-1 cells in a dilution series together with 4 µg/ml polybrene. Successfully transduced cells were subsequently selected with the appropriate mammalian selection marker.

Cell culture
hTERT RPE1 cells (ATCC, CRL-4000) and HEK239T cells were cultured in DMEM, 10% fetal bovine serum, 1% penicillin, 1% streptomycin at 37 °C, and 5% CO2. RPE1 TP53 -/cells were transduced with lentivirus carrying lenti.EF1a.dCas9.KRAB.Puro 41 followed by transduction with pLV.TETi.sgDNMT1 lentivirus. RPE1-p53ko-dCas9-KRAB-Teti-DNMT1 cells were cultured with 10 ug/ml puromycin for selection of the dCas9-KRAB plasmid, 10 µg/ml blasticidin for selection of guide RNA plasmid. Clonal cell lines were established by limiting dilution series. The expression of the guideRNA was induced by stimulation with doxycyclin 2 µg/ml, which was refreshed every 2-3 days. Cell counts were performed every week and 5.3 × 10E + 3 cells/cm 2 were seeded. Cell counts were multiplied by split ratio to correct for passage. To establish hTERT-RPE1 CRISPR knock-out cell lines, cells were seeded in a 10 cm dish and transfected with vectors encoding both Cas9 and sgRNA target sequence using an Amaxa Nucleofector II instrument (Lonza). Forty-eight hours after transfection single GFP + cells were sorted into 96-wells plates on FACS ARIA II/III Flow Cytometer (BD Biosciences) and expanded. Clones were tested for genome editing with PCR and Sanger sequencing and analyzed with the ICE CRISPR tool 69 .
Staining and microscopy. Cell culture medium was removed and 1 ml of staining solution, consisting of 2.5 uM SYTO 11 Green Fluorescent Nucleic Acid Stain (Invitrogen) in cell culture medium was added to each well of a 6-well plate and cells were incubated at 37 °C for 60 min. Cells were imaged using an EVOS M5000 Imaging System at 10X magnification.
Single cell DNA sequencing. Single cell sequencing was performed by the Single Cell Sequencing Core facility at the Hubrecht Institute. In short, cells were cultured with doxycycline for 6 weeks were resuspended and incubated with 2 ml of staining solution, consisting of 5 ug/ml Hoechst 34,580 in cell culture medium. Cell pellet was resuspended in PBS. Cells were FACS sorted in 384-well plates and lysis of individual cells was performed for 2 h at 55 °C using Proteinase K (Ambion) in 1 × Cutsmart (New England Biolabs) followed by heat inactivation at 80 °C for 10 min. The genomic DNA was subsequently fragmented with 100 nl NLAIII (R0125L, New England Biolabs) in 1 × Cutsmart (New England Biolabs) for 2 h at 37 °C followed by heat inactivation at 65 °C for 20 min. Then, 50 nl of 50 mM barcoded double-stranded NLAIII adapters and 400 nl of 40 U T4 DNA ligase (New England Biolabs) in 1 × T4 DNA ligase buffer (New England Biolabs) supplemented with 10 mM ATP (Invitrogen) was added to each well and ligated overnight at 16 °C. Libraries were sequenced on an Illumina Nextseq 2000 with 1 × 50 bp double-end sequencing. The fastq files were mapped to GRCH38 using the Burrows-Wheeler aligner. The mapped data were further analyzed using custom scripts in Python, which parsed for library barcodes, removed reads without a NlaIII sequence and removed PCR-duplicated reads. Copy number analysis was performed as described previously 66  Nanopore sequencing of centromeres. For ONT sequencing of centromeres, pericentromeric regions were isolated by AlphaHOR-RES (alpha higher-order repeat restriction and enrichment by size) 73 . In short, genomic DNA was extracted from ~ 25 million cells using an NEB High Molecular Weight DNA extraction kit followed by elution. Fully solubilized DNA was digested with MscI, and AseI. Digested DNA was loaded onto a 0.3% TAE agarose gel and run at 2 V/cm for 1 h. Fragments larger than 20 kb were purified using a Zymoclean Large Fragment DNA Recovery Kit. DNA was subsequently prepared for DNA sequencing using an ONT native library prep kit (LSK-109) and sequenced on a MinION (r9) flow cell. Nanopore raw data was basecalled using Guppy v5.0.17, and subsequently mapped against the CHM13 draft version 1.1 using minimap2 (map-ont settings). Reads were filtered for a mapping quality > = 10. Methylation was then called using nanopolish 0.11.1 using the nanopolish call-methylation command. For binarized methylation calls, a cutoff of 2 was used so that scores > 2 are interpreted as methylated and scores < -2 are interpreted as unmethylated.
Sequencing and data analysis.. For whole genome sequencing, DNA was isolated from cell pellets with the Qiasymphony (Qiagen) DNA isolation method and the Illumina TruSeq Nano DNA Library Prep Kit was used for library preparation. Samples were sequenced on HiSeq Xten or NovaSeq6000 platforms (Illumina) with 30 × coverage. All samples were analysed with the HMF pipeline V4.8 (https:// github. com/ hartw igmed ical/ pipel ine) which was locally deployed using GNU Guix with the recipe from https:// github. com/ UMCUG eneti cs/ guix-addit ions. Full pipeline description is explained in 74 , and details and settings of all the tools can be found at their Github page. Briefly, sequence reads were mapped against the human reference genome GRCh37 using Burrows-Wheeler Alignment (BWA-MEM) v0.7.5a 75 . Subsequently, somatic single base substitutions (SBSs), double base substitutions (DBSs) and small insertions and deletions (INDELS) were determined by Strelka v1.0.14 76 that are further annotated by PURPLE. PURPLE (v2.53) combines B-allele frequency (BAF) from AMBER (v3.3), read depth ratios from COBALT (v1.7), and structural variants from GRIDSS 77 to estimate copy number profiles, variant allele frequency (VAF), variant clonality and microhomology context at the breakpoints. To obtain high-quality somatic mutations that can be attributed to in vivo mutagenesis in the ASC clones, we only considered somatic mutations with a PURPLE derived variant allele frequency higher than 30% as mutations that fall outside this range were potentially induced in vitro after the clonal passage. Analysis of the SVs was based on the LINX (v1.26) 78 output which interprets and annotates simple and complex SV events from PURPLE and GRIDSS output. LINX chains individual SVs into SV clusters and classifies these clusters into various event types. Clusters can have one SV (for simple events such as deletions and duplications which all have 1 clusterId), or multiple SVs, with ClusterId > 1 and here considered as complex SV. We defined SV load as the total number of simple SV events. We quantified deletions and duplications (ResolvedType is 'DEL' or 'DUP') stratified by length (1-10 kb, 10-100 kb, 100 kb-1 Mb, 1-10 Mb, > 10 Mb). For complex SVs, we included "Com-plex_SV", |"Complex_DEL ", "RECIP_INV" and "RECIP_TRANS" under resolved_Type annotation feature.
Mutation burden analysis. The SBS, DBS and indel mutations were parsed from PURPLE vcfs by our developed R package Mutational Patterns 79 that was recently updated with DBS and indel functionality as well as COSMIC compatibility 80 . For each mutation type, we defined mutation burden as the total number of mutations of the autosomal genome.

Data availability
The datasets generated and analysed during the current study are available in the EGA repository, EGAS00001006845. www.nature.com/scientificreports/